Abstract

Background: Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide 
antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information
about its biosynthetic pathway. 

Results: A potential pelgipeptin synthetase gene cluster (plp) was identified from Paenibacillus elgii B69 through
genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster,
three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs), with one,
seven, and one module(s), respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation
domains indicated that the order of the NRPS modules is collinear with the order of amino acids in
pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3,
and PlpF A1) provided further evidence that the plp gene cluster is involved in pelgipeptin biosynthesis.

Conclusions: In this study, a gene cluster (plp) responsible for the biosynthesis of pelgipeptin was identified from the
genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to
develop novel lipopeptide antibiotics by genetic engineering. 
Background

The intensive use of chemical pesticides to treat plant 
diseases has resulted in various problems such as severe 
environmental pollution, food safety concerns, and 
emergence of drug resistance. Biological control using 
microorganisms or their metabolites, a more rational 
and safer method, has emerged as a promising alterna- 
tive to suppress plant pathogens and reduce the use of 
agrochemicals [1,2]. Pelgipeptins, a group of natural ac- 
tive compounds isolated from Paenibacillus elgii B69, 
are potential biological control agents [1]. This group of 
antibiotics has a general structure composed of a cyclic 
nonapeptide moiety and a β-hydroxy fatty acid. Four
analogues of pelgipeptin have been identified and
characterised [3]. These analogues are highly similar in
structure and differ only in one amino acid unit or in 
the lipid acid (Figure 1A). Pelgipeptin exhibits broad- 
spectrum antimicrobial activity against pathogenic 
bacteria and fungi, including Staphylococcus aureus, En- 
terococcus faecalis, Escherichia coli, Candida albicans, 
Fusarium oxysporum, F. graminearum, F. moniliforme, 
Rhizoctonia solani, and Colletotrichum lini [1,3]. This
compound effectively inhibited the development of 
sheath blight caused by R. solani on rice in a preliminary 
evaluation of the in vivo efficacy of pelgipeptin [1]. 

Figure 1 Pelgipeptin and the genes responsible for its biosynthesis. (A) Primary structure of pelgipeptin. (B) The plp gene cluster and
domain organisation of the NRPS.

Similar to polymyxin and fusaricidin from P. polymyxa,
pelgipeptin, containing non-proteinogenic and D-amino
acids, must be synthesised by a non-ribosomal
peptide synthetase (NRPS). An NRPS is a large multifunctional
enzyme with a modular structure [4]. Each
NRPS module catalyses the incorporation of a specific
substrate into the growing product. A typical module 
consists of three enzymatic domains, namely, adenyla- 
tion (A), thiolation (T; also known as peptidyl carrier 
protein), and condensation (C) domains. The A domain 
selects and activates a specific amino acid substrate, the 
T domain is responsible for tethering the activated substrate
to the 4'-phosphopantetheinyl cofactor, and the
C domain catalyses peptide bond formation between the 
elongating peptide and a new amino acid. In addition to 
these core domains, the terminal thioesterase (TE) and 
epimerisation (E) domains, as well as several other tai- 
loring domains, may also be present in NRPS modules. 
The order of modules of an NRPS is, in many cases, col- 
linear to the amino acid sequence of the corresponding 
peptide product. The collinearity rule of NRPS systems,
combined with knowledge of the specificity-conferring
code of A domains, allows the amino acid sequence and
modifications of the peptide synthesised by the
corresponding NRPS to be predicted [5]. However, few NRPS sequences
have been extensively described in comparison with the 
number of known peptide products, limiting the study
of the principles of non-ribosomal peptide synthesis and
the development of new bioactive peptides by genetic 
engineering. In this study, we identified and analysed a 
gene cluster involved in the biosynthesis of pelgipeptin 
and provided biochemical data for the collinearity of this 
peptide assembly line. 

Methods 

Bacterial strains and culture conditions

P. elgii B69, isolated from a soil sample [1], was cultured 
in nutrient broth. E. coli DH5α, for gene manipulation,
and E. coli BL21 (DE3), for overexpression of recombin- 
ant proteins, were cultivated on Luria-Bertani medium. 

Identification and in silico analysis of the plp gene cluster in
P. elgii B69 

The draft genome sequence of the strain was used to 
build a database in Bioedit to identify the putative NRPS 
genes in P. elgii B69 (http://www.mbio.ncsu.edu/BioEdit/ 
bioedit.html). The first and second C domains of PmxE 
(GenBank EU371992), which is a polymyxin synthetase 
subunit, were compared with the created database using 
local BLAST searches [6] as implemented in Bioedit.
Amino acid sequence homology searches were per- 
formed using the BLAST server at the National Centre 
for Biotechnology Information (NCBI, http://www.ncbi. 
nlm.nih.gov/BLAST/) site. NRPS domains were identi- 
fied by PKS/NRPS analysis (http://nrps.igs.umaryland. 
edu/nrps/) [7]. Prediction of 10 amino acids located at 
the substrate-binding pocket of the A domain and sub- 
strate specificity prediction were performed using the 
web-based program NRPS predictor
(http://ab.inf.uni-tuebingen.de/software/NRPSpredictor/) [8].

Cloning, expression and purification of A domains 

We synthesised four sets of specific forward and reverse
primers to amplify the A domains PlpD A1, PlpE A1,
PlpE A3, and PlpF A1:
plpD-A1-F (CTAG CCATGG AAAACATTTTGACCCG) and
plpD-A1-R (CAC CTCGAG TTCGTACTCCGCTCCG);
plpE-A1-F (GACA CCATGG ATTTGTTGTCGGAAG) and
plpE-A1-R (ATC CTCGAG CACGAACTCCACGCCGGTT);
plpE-A3-F (CTAG CCATGG CGGCGGAGCAGACAC) and
plpE-A3-R (CCC AAGCTT CGCGACGTAGTCGGCTC); and
plpF-A1-F (CTA GCTAGC TTGTCCGACTCCGAG) and
plpF-A1-R (GC GGATCC TCACTCCAGTCCGGTCT).
The regions encoding these A domains were
PCR-amplified from the genomic DNA of P. elgii B69
and cloned into the pET28a vector. The recombinant
plasmids were transformed into E. coli DH5α for gene
manipulation. After transformation into E. coli BL21 (DE3), the
recombinant proteins were overexpressed and produced
as described previously [9].
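
The primer design can be checked with a short script. The following is a minimal sketch that scans each primer for a restriction site; it assumes the spaced-off hexamers in the primers are recognition sequences of common cloning enzymes. The enzyme names are an assumption, since the text does not state which enzymes were used; the recognition sequences themselves are the standard ones. The primer strings are those listed above with spacing removed.

```python
# Standard recognition sequences of common cloning enzymes.
# Assumption: the text does not name the enzymes actually used.
SITES = {
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
}

# Primer sequences as given in the text, written without spaces.
PRIMERS = {
    "plpD-A1-F": "CTAGCCATGGAAAACATTTTGACCCG",
    "plpD-A1-R": "CACCTCGAGTTCGTACTCCGCTCCG",
    "plpE-A1-F": "GACACCATGGATTTGTTGTCGGAAG",
    "plpE-A1-R": "ATCCTCGAGCACGAACTCCACGCCGGTT",
    "plpE-A3-F": "CTAGCCATGGCGGCGGAGCAGACAC",
    "plpE-A3-R": "CCCAAGCTTCGCGACGTAGTCGGCTC",
    "plpF-A1-F": "CTAGCTAGCTTGTCCGACTCCGAG",
    "plpF-A1-R": "GCGGATCCTCACTCCAGTCCGGTCT",
}


def find_sites(primer):
    """Return (enzyme, 0-based position) for each recognition site found."""
    hits = []
    for enzyme, site in SITES.items():
        pos = primer.find(site)
        if pos != -1:
            hits.append((enzyme, pos))
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Each primer carries a single site near its 5' end, which is the usual arrangement for directional cloning into an expression vector.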

BL21 strains expressing each A domain were grown 
in Luria-Bertani medium supplemented with 50 µg/ml
kanamycin at 37 °C until the OD600 reached about 0.5.
Gene expression was induced with 0.1 mM
isopropyl-β-D-thiogalactopyranoside at 30 °C for 4 h. Cells were
harvested by centrifugation, resuspended in buffer A (40 mM
Tris-HCl, 200 mM NaCl, 20 mM imidazole, pH 8.0), and 
lysed by sonication on ice. The lysates were centrifuged at 
12 000 g for 30 min at 4 °C, and the supernatants were 
loaded onto a Ni Sepharose 6 FF (GE Healthcare) column. 
The column was washed with five bed volumes of buffer 
A, followed by five bed volumes of buffer B (40 mM Tris- 
HCl, 200 mM NaCl, 60 mM imidazole, pH 8.0). The re- 
combinant proteins were then eluted by buffer C (40 mM 
Tris-HCl, 200 mM NaCl, 150 mM imidazole, pH 8.0). 
Purified proteins were detected by 10 % sodium dodecyl 
sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) 
and dialysed against buffer D (40 mM Tris-HCl, pH 8.0, 
200 mM NaCl, and 1 mM dithiothreitol). Protein concen- 
tration was determined by the bicinchoninic acid protein 
assay (Pierce, USA) using bovine serum albumin as the 
standard. 



Determination of substrate specificity 

The substrate selectivity of each of the A domains was 
determined using a non-radioactive assay [10]. The reaction
mixture (40 µl) contained 0.5 µM recombinant
A domain, 0.2 U/ml inorganic pyrophosphatase, 5 mM
ATP, 100 mM NaCl, 10 mM MgCl2, and 6 mM amino
acid in 50 mM Tris-HCl (pH 7.5). Reactions were
started by the addition of ATP and incubated at 25 °C. 
The reactions were terminated by the addition of mo- 
lybdate/malachite green reagent. After 15 min of colour 
development, optical density was measured at 600 nm 
on a microplate reader (Multiscan MK3, Thermo Elec- 
tron Co. Ltd., Shanghai, China). A reaction mixture 
lacking the recombinant A domain was used as a nega- 
tive control. 

Nucleotide sequence accession numbers 

The DNA sequence of the pelgipeptin biosynthetic
gene cluster in P. elgii B69 was deposited in
GenBank under accession number JQ745271.

Results and discussion 

Analysis of the organisation of the plp gene cluster and
its flanking regions 

We recently completed the draft genome sequence of
the pelgipeptin-producing bacterium P. elgii B69, in
which at least five NRPS-related biosynthetic gene clusters
were found within its 7,981,270 bp long scaffold [11].
Further inspection revealed that several NRPS genes
located in scaffolds 3 and 43 were probably related to
pelgipeptin biosynthesis. The gaps between and within
these two scaffolds were filled by sequencing PCR products.
These efforts resulted in a complete NRPS gene
cluster (plp), harbouring eight open reading frames
(ORFs), which could be assigned to pelgipeptin biosynthesis.
These ORFs (designated plpA-plpH) were transcribed
in the same direction (Figure 1B). Upstream of
the plp locus, two genes (ORF2 and ORF3) encoding
proteins with similarities to heparinase II/III family proteins
(YP_003243728 and YP_003243727, respectively)
were transcribed in the same direction and were considered
not to be involved in pelgipeptin production. Further
upstream, a third ORF (ORF1), whose TGA stop
codon lies within ORF2, was found to encode a protein with
high similarity to short-chain dehydrogenases/reductases
(ZP_08509633) and was also considered not to be involved in
pelgipeptin biosynthesis. Downstream of the plpF
gene, four genes encoding putative ABC transporter proteins
were found. Two of these, PlpG and PlpH, shared 72% and 69%
identity with PmxC and PmxD, respectively, which
were considered responsible for the secretion of polymyxin
produced by P. polymyxa [12]. This transport activity
may be needed for the export of pelgipeptin out
of the cell, and therefore, the gene products were
attributed to pelgipeptin biosynthesis. The other two
genes (ORF4 and ORF5), encoding putative nitrate/
sulphonate/bicarbonate ABC transporter proteins, were
transcribed in the opposite direction and were considered
less likely to be involved in pelgipeptin production,
although further evidence will be required before this
can be decided unequivocally. The putative ORFs and
the genetic organisation of the chromosomal region containing
these sequences are depicted in Figure 1B.

Genes encoding NRPS 

As shown in Figure 1B, three NRPS genes, plpD, plpE, and
plpF, are present in the plp cluster, and these genes encode
proteins with estimated molecular masses of 171.8, 951.3,
and 122.9 kDa, respectively. The modules and domains of
pelgipeptin synthetase were analysed as described in the
Methods section above. PlpD, containing
four domains (C-A-T-C) (Figure 1B), had an N-terminal C
domain, which shared 43% identity with the starter C domain
of PmxE [12]. The substrate predicted for
the A domain of PlpD was 2,4-diaminobutyric acid (Dab)
(Table 1). The presence of a starter C domain in PlpD and
the specificity of the module for Dab are both consistent
with this module providing the first amino acid of the
pelgipeptin peptide; the fatty acid side chain
should therefore be connected to the peptide at this residue [13].
PlpE, containing 23 domains
(A-T-C-A-T-C-A-T-E-C-A-T-C-A-T-C-A-T-E-C-A-T-C), comprises seven modules
and a C domain. The seven PlpE A domains were predicted
to activate the amino
acids Ile, Dab, Phe, Leu, Dab, Val, and Leu, respectively.
Two modules contain an epimerisation domain, indicating
that the corresponding activated amino acids (Phe and Val) may
be converted into the D-configuration. Three domains
(A-T-TE) were present in PlpF, and the substrate predicted
for its A domain was Ser. The last domain of this
megasynthase was a thioesterase domain, indicating that
PlpF may be required for the release and cyclisation of the
synthesised lipopeptides. These results indicate that plpD
is the first and plpF the last gene involved in pelgipeptin
biosynthesis. Thus, the number of A domains, the order of
modules for amino acid assembly, and the location of
epimerisation domains correspond perfectly to the structural
characteristics of pelgipeptin (Figure 1), suggesting that
the plp gene cluster may be responsible for the synthesis
of pelgipeptin in the B69 strain.
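
The collinearity argument above can be made concrete with a minimal sketch. It concatenates the domain strings reported for PlpD, PlpE, and PlpF, starts a new module at every C domain, and pairs each module with the predicted substrate of its A domain. The function names and the "L-"/"D-" labelling are illustrative assumptions: a module is labelled D only because it contains an E domain, which the text says may convert the residue to the D-configuration.

```python
# Domain organisation and predicted A-domain substrates as reported in the text.
PLP_DOMAINS = {
    "PlpD": "C-A-T-C",
    "PlpE": "A-T-C-A-T-C-A-T-E-C-A-T-C-A-T-C-A-T-E-C-A-T-C",
    "PlpF": "A-T-TE",
}
PREDICTED_SUBSTRATES = ["Dab", "Ile", "Dab", "Phe", "Leu", "Dab", "Val", "Leu", "Ser"]


def split_modules(domain_strings):
    """Concatenate the proteins in order and start a new module at each C domain."""
    domains = [d for s in domain_strings for d in s.split("-")]
    modules = []
    for d in domains:
        if d == "C" or not modules:
            modules.append([])
        modules[-1].append(d)
    return modules


def predict_peptide(modules, substrates):
    """Pair each A-domain-containing module with its predicted substrate."""
    a_modules = [m for m in modules if "A" in m]
    if len(a_modules) != len(substrates):
        raise ValueError("number of A domains and substrates differ")
    # Illustrative labelling: an E domain in the module suggests a D-residue.
    return [("D-" if "E" in m else "L-") + aa for m, aa in zip(a_modules, substrates)]


modules = split_modules(PLP_DOMAINS[name] for name in ("PlpD", "PlpE", "PlpF"))
peptide = predict_peptide(modules, PREDICTED_SUBSTRATES)

if __name__ == "__main__":
    print(len(modules), "modules")
    print("-".join(peptide))
```

Splitting at each C domain assigns the trailing C domains of PlpD and PlpE to the following module, which is why the nine modules span the three proteins.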

In vitro assay of adenylation domains 

The substrate specificities of four A domains, PlpD A1,
PlpE A1, PlpE A3, and PlpF A1, were determined
through a non-radioactive assay to link the plp
gene cluster further to pelgipeptin synthesis. PlpD A1,
PlpE A3, and PlpF A1 were selected because
their predicted substrates (Dab, Phe, and Ser, respectively)
are characteristic amino acids of pelgipeptin. PlpE A1
was selected because its predicted substrate was Ile,
whereas the corresponding
amino acid (position 2) in pelgipeptin is variable
(Ile or Val). Recombinant A-domain proteins were expressed
and purified as described in the Methods
section above. All proteins were obtained in soluble form
with satisfactory yield (about
10 mg/L of culture) and purity (>95%).
The substrate selectivity of the A domains was
determined with the 20 proteinogenic amino acids plus
L-Dab and D-Phe (Figure 2). PlpD A1, PlpE A3, and
PlpF A1 clearly exhibited the highest activity for L-Dab,
L-Phe, and L-Ser, respectively. The PlpE A1 protein, however,
was found to activate L-Val (100%), L-Leu (82%), and
L-Ile (52%); the highest activity was set at 100%, and
background was usually below 5%. Val or Ile is found in
different analogues of pelgipeptin at position 2
(Figure 1A), whereas no analogue with Leu at this
position was detected. This phenomenon may be explained
by the effect of the C domain of module 2 (Figure 1),
because in some cases, C domains also play an important
role in substrate selectivity [4,14]. In general, the four
tested recombinant A domains selectively activated
the predicted amino acids, experimentally
supporting the proposal that the plp gene cluster
is involved in pelgipeptin biosynthesis.
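
The relative activities quoted above follow from a simple normalisation, sketched below. Two points are assumptions rather than the authors' stated procedure: that the reading of the negative control (no A domain) is subtracted before scaling, and the function name `relative_activity`. The optical density values are invented for illustration and are not measurements from this study.

```python
def relative_activity(od_by_substrate, od_blank):
    """Scale blank-corrected readings so that the best substrate equals 100%."""
    # Assumption: the no-enzyme control is subtracted before normalising.
    corrected = {aa: max(od - od_blank, 0.0) for aa, od in od_by_substrate.items()}
    top = max(corrected.values())
    if top <= 0:
        raise ValueError("no signal above the negative control")
    return {aa: 100.0 * value / top for aa, value in corrected.items()}


# Hypothetical OD600 readings for three substrates (illustration only).
readings = {"substrate_a": 0.90, "substrate_b": 0.50, "substrate_c": 0.12}
rel = relative_activity(readings, od_blank=0.10)

if __name__ == "__main__":
    for aa, pct in rel.items():
        print(f"{aa}: {pct:.1f}%")
```

With this scaling, a substrate whose corrected signal is half that of the best substrate is reported as 50%, and readings close to the control fall below the 5% background level mentioned in the text.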

Figure 2 Substrate specificity of the A domains by non-radioactive assay. The assay was performed with 20 different proteinogenic amino
acids plus L-Dab and D-Phe. The highest activity was set at 100%. Only amino acids related to the composition of pelgipeptin are shown. Other
amino acids with relative activities < 5% are not shown.

The plpA gene responsible for L-2,4-diaminobutyrate
biosynthesis

The peptide core of pelgipeptin contains three residues of the
non-proteinogenic amino acid L-2,4-diaminobutyrate, at
positions 1, 3, and 6. Several studies have indicated that this
unusual amino acid is formed from aspartate β-semialdehyde
in a reaction catalysed by the enzyme diaminobutyrate-2-oxoglutarate
transaminase [15,16]. The plpA gene encoded a putative
homologue of this enzyme and was proposed to be
responsible for L-2,4-diaminobutyrate biosynthesis in P.
elgii B69. The deduced amino acid sequence of the plpA
gene product (PlpA, 428 amino acids) showed 50% and
38% identity with EctB from Halobacillus dabanensis [15]
and PvdH from Pseudomonas aeruginosa [16], respectively.
It has been demonstrated that an important substrate of
diaminobutyrate-2-oxoglutarate transaminase is aspartate
β-semialdehyde, which is formed from aspartyl phosphate
in a reaction catalysed by aspartate-semialdehyde dehydrogenase [16].
Aspartate β-semialdehyde is also a metabolic precursor for
several other amino acids, including lysine, threonine,
isoleucine, methionine, and diaminopimelate. Therefore, the
addition of these amino acids to the culture may
favour the synthesis of pelgipeptin by the strain, presumably
by sparing the shared precursor, although
most of these amino acids are not components of
this lipopeptide antibiotic. This hypothesis is supported by
the finding that supplementation of the fermentation
medium with the amino acids listed above increased the
production of pelgipeptin [3].

The plpB gene encoded a predicted extracellular lipolytic 
enzyme 

The deduced product of the plpB gene was a putative lipase/
esterase with a typical secretory signal peptide containing
three distinct domains, namely, an N domain with
two positively charged lysine residues, a hydrophobic core
domain (H domain), and a C domain with the consensus
sequence A-X-A at positions 23 to 25, which is a type
I SPase cleavage site [17]. Cleavage at this site would
give rise to a predicted mature protein (PlpB) with 495
amino acids and a molecular mass of 53.8 kDa. A comparison
of the deduced amino acid sequence of PlpB
with the sequences of lipases/esterases in the EMBL and
SwissProt databases showed significant homology to the
nucleophilic serine region of lipases/esterases, with 36%
identity to LipB from Bacillus subtilis [18,19]. LipB
preferentially hydrolysed carboxyl esters of fatty acids with
short chain lengths (less than 10 carbon atoms), indicating
that it was an esterase rather than a lipase. As in
the extracellular lipolytic enzymes from the related
genus Bacillus, Ala replaces the first Gly of the conserved
Gly-X-Ser-X-Gly pentapeptide motif in PlpB [20].
Previous studies have reported that supplementing the
fermentation medium with fatty acids of various chain
lengths enhanced the biosynthesis of lipopeptides containing
specific fatty acid side chains [21,22]. Thus, we
speculated that the predicted extracellular lipase, PlpB,
may facilitate the production of pelgipeptin through
hydrolysis of water-soluble carboxyl esters in cultures of
strain B69.

The plpC gene encoded a predicted phosphopantetheinyl
transferase 

The T domains of PlpD, PlpE, and PlpF must be converted from
their inactive apo forms to cofactor-bearing holo forms
by a specific phosphopantetheinyl transferase via
phosphopantetheinylation of thiotemplates. The product of
the plpC gene might be responsible for this conversion. 
The deduced protein (244 amino acids) encoded by plpC 
showed high similarity to Sfp from B. subtilis (38% iden- 
tity, 58% similarity), Gsp from B. brevis (37% identity, 
54% similarity), Psf-1 from B. pumilus (35% identity, 
55% similarity), and other phosphopantetheinyl trans- 
ferases associated with non-ribosomal peptide synthe- 
tases. Further analysis indicated that PlpC fell within the 
W/KEA subfamily of Sfp-like phosphopantetheinyl 
transferases, which is involved in many kinds of second- 
ary metabolite synthesis [23]. 

The N-terminal C domain 

The plp gene cluster contained a special C domain at the
N terminus of PlpD (first C domain), in addition to eight
typical C domains that presumably catalysed peptide-bond
formation between the adjacent amino acid residues
of pelgipeptin. Sequence alignments showed that this
first C domain of PlpD had only 19-25% identity with the
remaining eight C domains of PlpD, -E, and -F, but
shared 31-43% identity with other first C domains of
lipopeptide synthetases, such as the NRPSs of surfactin [24],
lichenysin [25], fengycin [26], fusaricidin [27] and
polymyxin [12]. In the initiation reaction of the biosynthesis
of surfactin, module 1 of SrfA alone was sufficient to
catalyse the transfer of the β-hydroxymyristoyl group to SrfA
followed by formation of β-hydroxymyristoyl-glutamate
[28]. A recent study by Choi's group also suggested that
only the N-terminal C domain of PmxE was necessary
for the fatty acyl tailing of polymyxin [12]. Thus, in the
initial step of pelgipeptin biosynthesis, the PlpD
N-terminal C domain was proposed to catalyse the
condensation of the first amino acid (Dab) with a β-hydroxy
fatty acid transferred from coenzyme A.

Conclusions 

In the present study, we identified a potential pelgipeptin
synthetase gene cluster (plp) in P. elgii B69 through
genome analysis. The cluster spans 40.8 kb with three
NRPS genes (plpD, plpE, and plpF). The determination
of the substrate specificity of four A domains, PlpD A1, PlpE
A1, PlpE A3, and PlpF A1, further linked the plp gene
cluster to pelgipeptin synthesis. Despite numerous attempts,
we were unable to provide final proof by constructing
a pelgipeptin-deficient mutant,
because this strain was hardly amenable to genetic
manipulation. However, all the results mentioned
above strongly support the assignment of the plp gene
cluster as the one responsible for the production of
pelgipeptin. Our results enrich the understanding of the
enzymatic action in lipopeptide biosynthesis and provide
insight into the mechanism of natural product diversity.